Efficient GPU-accelerated fitting of observational health-scaled stratified and time-varying Cox models
The Cox proportional hazards model stands as a widely-used semi-parametric
approach for survival analysis in medical research and many other fields.
Numerous extensions of the Cox model have further expanded its versatility.
Statistical computing challenges arise, however, when applying many of these
extensions with the increasing complexity and volume of modern observational
health datasets. To address these challenges, we demonstrate how to employ
massive parallelization through graphics processing units (GPU) to enhance the
scalability of the stratified Cox model, the Cox model with time-varying
covariates, and the Cox model with time-varying coefficients. First, we
establish how the Cox model with time-varying coefficients can be transformed
into the Cox model with time-varying covariates when using discrete
time-to-event data. We then demonstrate how to recast both of these into a
stratified Cox model and identify the shared computational bottleneck that arises when evaluating the now-segmented partial likelihood and its gradient with respect to the regression coefficients at scale. These computations mirror a highly transformed segmented scan operation. While this bottleneck is not an immediately obvious target for multi-core parallelization, we convert it into an unsegmented operation to leverage the efficient many-core parallel scan
algorithm. Our massively parallel implementation significantly accelerates
model fitting on large-scale and high-dimensional Cox models with
stratification or time-varying effects, delivering an order-of-magnitude speedup over traditional central processing unit (CPU)-based implementations.
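The conversion from a segmented to an unsegmented scan mentioned above follows a standard pattern: pair each value with a segment-start flag and scan with an associative operator that resets the running total at stratum boundaries, so one many-core scan primitive handles all strata at once. Below is a minimal sketch of this idea, assuming a within-stratum cumulative sum of risk scores; the function and variable names are illustrative and not taken from the authors' implementation.

```python
import jax
import jax.numpy as jnp

def segmented_cumsum(values, segment_starts):
    """Within-stratum cumulative sums via one unsegmented associative scan.

    values: (n,) e.g. exp(x @ beta) risk scores, grouped by stratum.
    segment_starts: (n,) bool, True where a new stratum begins.
    """
    def combine(left, right):
        f1, v1 = left
        f2, v2 = right
        # If the right element opens a new stratum, discard the left prefix;
        # otherwise accumulate across the boundary. The operator is
        # associative, so a parallel scan applies directly.
        return f1 | f2, jnp.where(f2, v2, v1 + v2)

    _, sums = jax.lax.associative_scan(combine, (segment_starts, values))
    return sums

# Hypothetical usage: three strata of sizes 3, 2, and 2.
vals = jnp.array([1., 2., 3., 10., 20., 5., 5.])
starts = jnp.array([True, False, False, True, False, True, False])
print(segmented_cumsum(vals, starts))  # [ 1.  3.  6. 10. 30.  5. 10.]
```

Because the combining operator is associative, the same parallel scan primitive that evaluates an ordinary prefix sum can evaluate the segmented version without any per-stratum bookkeeping.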
Massive Parallelization of Massive Sample-size Survival Analysis
Large-scale observational health databases are increasingly popular for
conducting comparative effectiveness and safety studies of medical products.
However, the increasing number of patients poses computational challenges when
fitting survival regression models in such studies. In this paper, we use
graphics processing units (GPUs) to parallelize the computational bottlenecks
of massive sample-size survival analyses. Specifically, we develop and apply
time- and memory-efficient single-pass parallel scan algorithms for Cox
proportional hazards models and forward-backward parallel scan algorithms for
Fine-Gray models for analysis with and without a competing risk using a cyclic
coordinate descent optimization approach. We demonstrate that GPUs accelerate
the computation of fitting these complex models in large databases by
orders-of-magnitude as compared to traditional multi-core CPU parallelism. Our
implementation enables efficient large-scale observational studies involving
millions of patients and thousands of patient characteristics.
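To see why a single pass suffices in the Cox case, note that when subjects are sorted by increasing event time, every risk-set denominator in the partial likelihood and its gradient is a suffix (reverse cumulative) sum of the exponentiated linear predictors. The sketch below illustrates this reduction for the Breslow formulation without tie correction; it is an illustration of the underlying algebra under those assumptions, not the authors' GPU implementation.

```python
import jax.numpy as jnp

def cox_neg_loglik_and_grad(beta, X, events):
    """Negative Breslow partial log-likelihood and its gradient.

    X: (n, p) covariates, rows sorted by increasing event/censoring time.
    events: (n,) 1.0 if the subject had the event, 0.0 if censored.
    """
    eta = X @ beta
    w = jnp.exp(eta)
    # Suffix (reverse cumulative) sums give every risk-set denominator in one
    # pass -- this is the scan that the GPU algorithm parallelizes.
    denom = jnp.cumsum(w[::-1])[::-1]                        # (n,)
    num = jnp.cumsum((w[:, None] * X)[::-1], axis=0)[::-1]   # (n, p)
    loglik = jnp.sum(events * (eta - jnp.log(denom)))
    grad = jnp.sum(events[:, None] * (X - num / denom[:, None]), axis=0)
    return -loglik, -grad
```

In the Fine-Gray setting, subjects who experience the competing event remain in later risk sets with time-dependent weights, which is roughly why a forward scan is needed in addition to the backward one.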
Anni 2.0: a multipurpose text-mining tool for the life sciences
Anni 2.0 provides an ontology-based interface to MEDLINE.
Rewriting and suppressing UMLS terms for improved biomedical term identification
Background: Identification of terms is essential for biomedical text mining. We concentrate here on the use of vocabularies for term identification, specifically the Unified Medical Language System (UMLS). To make the UMLS more suitable for biomedical text mining we implemented and evaluated nine term rewrite and eight term suppression rules. The rules rely on UMLS properties that have been identified in previous work by others, together with an additional set of new properties discovered by our group during our work with the UMLS. Our work complements the earlier work in that we measure the impact of the different rules on the number of terms identified in a MEDLINE corpus. The number of uniquely identified terms and their frequency in MEDLINE were computed before and after applying the rules. The 50 most frequently found terms, together with a sample of 100 randomly selected terms, were evaluated for every rule.
Results: Five of the nine rewrite rules were found to generate additional synonyms and spelling variants that correctly corresponded to the meaning of the original terms, and seven of the eight suppression rules were found to suppress only undesired terms. Using the five rewrite rules that passed our evaluation, we were able to identify 1,117,772 new occurrences of 14,784 rewritten terms in MEDLINE. Without the rewriting, we recognized 651,268 terms belonging to 397,414 concepts; with rewriting, we recognized 666,053 terms belonging to 410,823 concepts, an increase of 2.8% in the number of terms and 3.4% in the number of concepts recognized. Using the seven suppression rules, a total of 257,118 undesired terms were suppressed in the UMLS, notably decreasing its size; 7,397 terms were suppressed in the corpus.
Conclusions: We recommend applying the five rewrite rules and seven suppression rules that passed our evaluation when the UMLS is to be used for biomedical term identification in MEDLINE. A software tool to apply these rules to the UMLS is freely available at http://biosemantics.org/casper.
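As a rough illustration of the kind of rules described here, a rewrite rule generates additional spelling or synonym variants of a UMLS term, while a suppression rule removes terms that are undesirable for matching. The concrete rules and names below are assumptions chosen for illustration, not the evaluated CASPER rule set.

```python
import re

def rewrite_term(term):
    """Return the term plus generated variants (hypothetical rules)."""
    variants = [term]
    # Assumed rewrite rule: drop a trailing ", NOS" qualifier.
    if term.endswith(", NOS"):
        variants.append(term[: -len(", NOS")])
    # Assumed rewrite rule: drop a bracketed qualifier, e.g. "Cold (disease)".
    stripped = re.sub(r"\s*\([^)]*\)$", "", term)
    if stripped != term:
        variants.append(stripped)
    return variants

def suppress_term(term):
    """Hypothetical suppression rule: very short terms match too promiscuously."""
    return len(term) < 3

print(rewrite_term("Cold (disease)"))  # ['Cold (disease)', 'Cold']
print(suppress_term("Mg"))             # True
```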
Comparison of First-Line Dual Combination Treatments in Hypertension: Real-World Evidence from Multinational Heterogeneous Cohorts.
Background and objectives: The 2018 ESC/ESH Hypertension guideline recommends a 2-drug combination as initial anti-hypertensive therapy. However, real-world evidence for the effectiveness of the recommended regimens remains limited. We aimed to compare the effectiveness of first-line anti-hypertensive treatment combining two of the following classes: angiotensin-converting enzyme (ACE) inhibitors/angiotensin-receptor blockers (A), calcium channel blockers (C), and thiazide-type diuretics (D). Methods: Treatment-naïve hypertensive adults without cardiovascular disease (CVD) who initiated dual anti-hypertensive medications were identified in 5 databases from the US and Korea. The patients were matched for each comparison set by large-scale propensity score matching. The primary endpoint was all-cause mortality. Myocardial infarction, heart failure, stroke, and major adverse cardiac and cerebrovascular events as a composite outcome comprised the secondary measures. Results: A total of 987,983 patients met the eligibility criteria. After matching, 222,686, 32,344, and 38,513 patients were allocated to the A+C vs. A+D, C+D vs. A+C, and C+D vs. A+D comparisons, respectively. There was no significant difference in mortality during a total of 1,806,077 person-years: A+C vs. A+D (hazard ratio [HR], 1.08; 95% confidence interval [CI], 0.97-1.20; p=0.127), C+D vs. A+C (HR, 0.93; 95% CI, 0.87-1.01; p=0.067), and C+D vs. A+D (HR, 1.18; 95% CI, 0.95-1.47; p=0.104). A+C was associated with a slightly higher risk of heart failure (HR, 1.09; 95% CI, 1.01-1.18; p=0.040) and stroke (HR, 1.08; 95% CI, 1.01-1.17; p=0.040) than A+D. Conclusions: There was no significant difference in mortality among the A+C, A+D, and C+D combination treatments in patients without previous CVD. This finding was consistent across multinational heterogeneous cohorts in real-world practice.
Using the Data Quality Dashboard to improve the EHDEN network
Federated networks of observational health databases have the potential to be a rich resource to inform clinical practice and regulatory decision making. However, the lack of standard data quality processes makes it difficult to know if these data are research ready. The EHDEN COVID-19 Rapid Collaboration Call presented the opportunity to assess how the newly developed open-source tool Data Quality Dashboard (DQD) informs the quality of data in a federated network. Fifteen Data Partners (DPs) from 10 different countries worked with the EHDEN taskforce to map their data to the OMOP CDM. Throughout the process, at least two DQD results were collected and compared for each DP. All DPs showed an improvement in their data quality between the first and last run of the DQD. The DQD excelled at helping DPs identify and fix conformance issues but showed less of an impact on completeness and plausibility checks. This is the first study to apply the DQD on multiple, disparate databases across a network. While study-specific checks should still be run, we recommend that all data holders converting their data to the OMOP CDM use the DQD, as it ensures conformance to the model specifications and that a database meets a baseline level of completeness and plausibility for use in research.